INTERPRETATION OF INTERACTION: A REVIEW

By Amy Berrington de Gonzalez and D. R. Cox 

Johns Hopkins Bloomberg School of Public Health 
and Nuffield College, Oxford 

Several different types of statistical interaction are defined and 
distinguished, primarily on the basis of the nature of the factors defin- 
ing the interaction. Illustrative examples, mostly epidemiological, are 
given. The emphasis is primarily on interpretation rather than on 
methods for detecting interactions. 

1. Introduction. Interaction is one of the fundamental concepts of sta- 
tistical analysis. Establishing the presence or absence of interaction may 
be a key to correct interpretation of data. Discussion of interaction falls 
under three broad headings, namely, its definition, its detection and its in- 
terpretation. This paper is mostly devoted to the last, interpretation. Our 
illustrations are largely epidemiological; the relevance of the ideas is much 
wider. 

We consider studies in which on a number of individuals there are ob- 
served one or more response (or outcome) variables and typically several 
explanatory variables, conveniently called factors, that are thought possibly 
to influence the response. We consider initially interaction between a given 
pair of factors. From the statistical perspective, interaction is said to occur 
if the separate effects of the factors do not combine additively. That is, in- 
teraction is a particular kind of nonadditivity. The terminology is in some 
ways unfortunate in that there is no necessary implication of, say, biological 
interaction in the sense of synergism or antagonism. 

When the outcome is measured on a quantitative scale, interaction on one
scale may possibly be removed by a nonlinear transformation of the scale. For
binary outcomes, representing say survival and death, interaction is defined
via the nonadditivity of some function of the probability of death. When the
probability is small, absence of interaction on the logistic scale implies that,
to a close approximation, separate explanatory variables combine their effects
multiplicatively. From a public health perspective, it may be preferable to
consider instead, or as well, the probability scale itself, on which absence of
interaction means additivity of effect [Berkson (1958)]. An interpretation via
probabilities is then directly in terms of differences of numbers of individuals
at risk.

Detection of interaction is achieved essentially by comparing the fits of 
models with and without interaction terms, or sometimes by estimation of 
defining parameters, and will hardly be discussed here; one of the main issues 
for choice, especially when one or both factors have several levels, concerns 
how general the interaction terms should be. That is, is it wise to restrict, 
initially at least, the interaction to particular patterns of effect? For a review 
of techniques for detecting interaction, see Cox (1984). 

The paper begins by making an important distinction between types of 
explanatory variables. We then discuss a very simple situation not commonly 
thought of as illustrating interaction and then discuss the interpretation of 
the main types of two-factor interaction that can arise. 

2. Types of factor. Factors, or explanatory variables, can be classified in 
various ways. First the levels of a factor may be defined by a quantitative 
variable, by an ordinal variable or the different levels may be qualitatively 
different. Examples are respectively dose level of medication, level of expo- 
sure (severe, moderate, absent) and centers (in a multi-center trial), when 
these are seen as essentially providing replication rather than as the focus 
of particular interest. 

More importantly, for our purpose, we classify factors as: 

• primary factors or what in some contexts might be called treatments or 
quasi-treatments, 

• intrinsic factors defining the study individuals, 

• nonspecific factors, representing groupings of the study individuals that 
are of no intrinsic interest but which may have nonnegligible effect on the 
response. 

This classification is strongly context-specific. 

In a randomized experiment the primary factors are those randomized 
treatments that form the focus of the study. In a comparable observational 
study they are broadly those that would have been treatments had random- 
ization been feasible. Comparison of their effect aims at a causal interpre- 
tation, although in an observational study claims of causality have to be 
approached very cautiously. Conceptually, at least, for a given study indi- 
vidual, a primary factor might have been different from the value observed; 
thus, an individual might have been randomized to a different treatment 
from that actually encountered.

Intrinsic factors define the study individuals, and hence usually an individual
could not have been randomized to receive a different "intrinsic factor."
In an epidemiological context these typically include gender, socio-economic 
class, educational and family background. The role of many variables such 
as smoking status depends strongly on context; they may be a main focus of 
interest or be regarded as intrinsic. Genetic information about an individual 
may be taken as helping to define a study individual, and hence intrinsic, 
but in the study of a potentially Mendelian disease genetic information may 
be a primary factor. In the latter case we implicitly consider the question: 
what would the health status of this individual have been had this allele 
been different from how it is? 

The two-factor interactions of most interest are those in which at least one 
factor is a primary factor and there are thus three main cases to consider. 
First, however, we discuss a simpler situation which at first sight may not 
seem to involve the concept of an interaction at all. 

3. Constancy of variance. Consider a continuous response variable $y$
and, for simplicity, two treatments. In the absence of further structure in the
data, we have a two-sample problem defined implicitly by two distribution
functions $F_0(y)$ and $F_1(y)$ corresponding to the two treatments $T_0$ and $T_1$.

There is then a sense in which absence of interaction implies that one
distribution is a translation of the other, $F_1(y) = F_0(y - \theta)$.

This interpretation hinges on the notion of unit-treatment additivity. That 
is, the response observed on a particular individual is assumed to be the sum 
of a contribution characteristic of the individual and a constant defined by 
the treatment received. Whatever may be the distribution of the individual 
characteristics, this implies the stated translational form. 
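
The step from unit-treatment additivity to the translational form can be
written out in one line. In the sketch below, $U$ (the contribution
characteristic of the individual) and $\theta$ (the constant added by the
second treatment) are notation introduced here for illustration only.

```latex
% U     : contribution characteristic of the individual, with arbitrary distribution
% theta : constant added by treatment T_1 relative to T_0 (unit-treatment additivity)
\begin{align*}
Y_0 &= U, \qquad Y_1 = U + \theta, \\
F_1(y) &= \Pr(Y_1 \le y) = \Pr(U \le y - \theta) = F_0(y - \theta).
\end{align*}
```

No assumption about the distribution of $U$ is used, which is why the
translational form holds whatever the distribution of the individual
characteristics.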

Thus, if $T_1$ is a potential cholesterol-lowering drug and $T_0$ a control,
absence of translational form would imply that on average the drug had
a differential effect at different levels of cholesterol, on the scale in which
cholesterol is measured.

There are now two cases. First, if two distribution functions $F_1(y)$ and
$F_0(y)$ are such that as $y$ takes values over the support of the distributions
$F_1(y) - F_0(y)$ takes both signs, then we say the distribution functions cross.
If the distribution functions do not cross, it may be shown that a nonlinear
transformation of $y$ induces translational form implying consistency with
unit-treatment additivity on the new scale. If, on the other hand, the
distribution functions do cross, clearly no such transformation is possible. In
the illustrative example there would at least be the implication that $T_1$ is
beneficial for some individuals and harmful for others.

If the distributions are approximately normal, they are characterized by
means $(\mu_1, \mu_0)$ and variances $(\sigma_1^2, \sigma_0^2)$ and the
distribution functions do not cross if and only if the variances are equal.
Examination of equality of variance is quite commonly presented as a technical
statistical issue concerned with the validity of tests of significance. It may
often be more fruitful to consider it a substantive issue concerning implied
interaction.

Now a normal distribution can at best be a good approximation and is
unlikely to hold accurately in the extreme tails. Two normal distribution
functions will cross at a probability level $\Phi(k)$, where

$$k = (\mu_1 - \mu_0)/(\sigma_0 - \sigma_1),$$

so that unless this is in a reasonably central part of the distribution, say,
$|k| < 2$, the crossing over is unlikely to have sensible substantive
interpretation.

An approximate confidence band for the point of intersection can most 
readily be found by computing its profile likelihood function. 
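
The crossing level of two normal distribution functions can be checked
numerically. The following is a minimal sketch, assuming two normal
distributions with unequal standard deviations; the function names and the
numerical values are hypothetical and chosen only for illustration. It uses
the fact that the curves cross where both standardized deviates equal the same
value k, which gives k = (mu1 - mu0)/(sigma0 - sigma1). It does not attempt
the profile likelihood calculation mentioned above.

```python
import math


def normal_cdf(y, mu, sigma):
    """Distribution function of a normal variable with mean mu and s.d. sigma."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def crossing_level(mu0, sigma0, mu1, sigma1):
    """Return (k, y): the common standardized deviate and the crossing point.

    The two distribution functions are equal where
    (y - mu0) / sigma0 == (y - mu1) / sigma1 == k.
    """
    if sigma0 == sigma1:
        raise ValueError("equal standard deviations: the functions do not cross")
    k = (mu1 - mu0) / (sigma0 - sigma1)
    y = mu0 + k * sigma0  # the same point as mu1 + k * sigma1
    return k, y


# Hypothetical values, not taken from any study.
k, y = crossing_level(mu0=5.0, sigma0=1.0, mu1=4.5, sigma1=1.5)
central = abs(k) < 2  # crude check that the crossing is not in the extreme tails
print(k, y, normal_cdf(y, 5.0, 1.0), central)
```

If `central` is false, the crossing lies in a region where the normal
approximation is least trustworthy, which is the caution raised in the text.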

4. Removable interaction. We may call an interaction removable if a 
transformation of the outcome scale can be found that induces additivity. 
The importance of this is partly that presentation of the conclusions and 
the resulting interpretation may be improved by the resulting formal sim- 
plification. It would be a mistake, however, to achieve this simplification by 
measuring effects on a scale that is very hard to understand or interpret 
[Breslow and Day (1980)]. Note also that removable interactions are incon- 
sistent with average effect reversal. For example, absence of interaction with 
gender on a transformed scale excludes the possibility that a treatment is 
on the average beneficial for men and on the average harmful for women, 
whatever the transformation of the measurement scale used. 
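
A small numerical illustration may help. The sketch below uses hypothetical
cell means for two two-level factors whose effects combine multiplicatively;
the interaction contrast is nonzero on the original scale and vanishes after a
logarithmic transformation. The function name and the numbers are assumptions
made for this example.

```python
import math


def interaction_contrast(m00, m01, m10, m11, transform=lambda v: v):
    """Two-factor interaction contrast of four cell means on a chosen scale."""
    t = transform
    return t(m11) - t(m10) - t(m01) + t(m00)


# Hypothetical means: baseline 10, factor A doubles and factor B triples the mean.
m00, m10, m01, m11 = 10.0, 20.0, 30.0, 60.0

raw_contrast = interaction_contrast(m00, m01, m10, m11)
log_contrast = interaction_contrast(m00, m01, m10, m11, transform=math.log)
print(raw_contrast, log_contrast)
```

Here the interaction is removable: the effect of each factor has the same sign
at both levels of the other, and only its size depends on the scale.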

For a continuous and positive response variable, y, the transformations 
commonly used are logarithmic and simple powers, occasionally with a trans- 
lated origin. For binary data, the logistic or sometimes probit or complemen- 
tary log scale may be effective. While achieving additivity of effects is helpful, 
interpretability is the overriding concern. Thus, the transformation from $y$
to $y^{1/3}$ might remove an interaction but, unless $y$ was a representation of a
volume, $y^{1/3}$ might well not be a good basis for interpretation.

Terminology differs somewhat between fields of application; removable
interactions are sometimes referred to as quantitative or ordinal interactions,
whereas nonremovable interactions are referred to as qualitative, cross-over
or disordinal interactions [see Cronbach and Snow (1981)]. In the remainder
of this paper we use the terminology of quantitative and qualitative
interactions.

We now discuss and illustrate with examples the interpretation of the 
three main cases of interest, that is, interactions that involve a primary 
factor.

Table 1
Estimated relative risks of lung cancer from Gustavsson et al. (2002)

Asbestos exposure   Current smoker   Relative risk   (95% CI)
No                  No               1.0
No                  Yes              21.7            (14.3, 32.6)
2.5+ fiber-years    No               10.2            (2.5, 41.2)
2.5+ fiber-years    Yes              43.1            (20.1, 88.6)

5. Examples. 

5.1. Interaction between two primary factors. 

5.1.1. Quantitative interaction. Interpretation of quantitative interac- 
tion between two primary factors is complicated by the fact that, by defi- 
nition, a quantitative interaction can be removed by transforming the scale 
of measurement. Results can be generalized more easily if the interaction 
is removed, but, as mentioned above, this should not usually be achieved 
at the expense of measuring effects on a scale that is difficult to interpret. 
Interpretation will often depend upon the aim of the investigation. 

Gustavsson et al. (2002), for example, conducted a prospective study to 
investigate whether there was evidence of interaction between exposure to 
asbestos and smoking with respect to the risk of lung cancer. They performed 
two tests for interaction between these two primary factors: one for departure 
from an additive model and one for departure from a multiplicative model 
(equivalent to testing for additivity on the log scale). The relative risks for 
each exposure group compared to those subjects who were not exposed to 
either risk factor (noncurrent smokers who were not exposed to asbestos) 
are shown in Table 1. 

The observed relative risk for the joint effect of the two risk factors (43.1) 
was significantly less than would have been expected under a multiplicative 
model (21.7 × 10.2 = 221.3), but was slightly greater than expected under
the additive model (21.7 + 10.2 - 1 = 30.9). However, departure from the
additive model was not statistically significant. Hence, these results could 
either be interpreted as evidence that the effects of exposure to asbestos and 
tobacco could be additive with respect to the risk of lung cancer (i.e., act in- 
dependently on this scale) or that there is a quantitative, sub-multiplicative 
interaction (i.e., they interact negatively) on a probability scale. Since bi- 
ological or other information to support one scale over the other is rarely 
available [see Siemiatycki and Thomas (1981) for an example], it is not pos- 
sible to choose between these two interpretations. 
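
The arithmetic behind the two reference models can be reproduced directly from
the relative risks in Table 1. The following is a minimal sketch; the function
and variable names are hypothetical, and it only recomputes the expected joint
relative risks, not the significance tests reported by the original authors.

```python
def expected_joint_rr(rr_a, rr_b):
    """Expected joint relative risk under additive and multiplicative models.

    Both inputs are relative risks measured against the doubly unexposed group.
    """
    additive = rr_a + rr_b - 1.0   # excess relative risks add
    multiplicative = rr_a * rr_b   # relative risks multiply
    return additive, multiplicative


# Relative risks from Table 1.
rr_smoking, rr_asbestos, rr_both = 21.7, 10.2, 43.1

additive, multiplicative = expected_joint_rr(rr_smoking, rr_asbestos)
print(round(additive, 1), round(multiplicative, 1), rr_both)
```

The observed joint relative risk lies between the two expectations, much
closer to the additive one.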

In this example the authors' aim was not to try to elucidate biological
mechanisms but to inform policy. In particular, they were interested in
whether special efforts should be made to help asbestos-exposed persons to
stop smoking. Because the data were found to be consistent with the additive
model for the joint effect of asbestos and smoking, this suggests that such a
program is not necessary, as asbestos-exposed persons have approximately
the same absolute increase in lung cancer risk from smoking as nonexposed
persons. Several authors refer to this as absence of 'public health interaction'
[Blot and Day (1979) and Rothman et al. (1980)].

Table 2
Estimated relative risk of endometrial cancer in relation to cyclic-combined HRT use,
according to body mass index [Beral et al. (2005)]

HRT use         Body mass index   Relative risk   (95% CI)
ever vs never   <25 kg/m²         1.54            (1.20, 1.99)
ever vs never   25-29 kg/m²       1.07            (0.82, 1.40)
ever vs never   30+ kg/m²         0.67            (0.49, 0.91)

5.1.2. Qualitative interaction. Although it could be said that qualitative 
interaction is the only 'essential' statistical interaction, because it is nonre- 
movable, if we use this approach, in practice, we would accept only effect 
reversal as evidence of interaction. Interesting and important quantitative 
interactions could therefore be overlooked. Qualitative interactions are
relatively rare, but when they do occur they are usually of considerable interest.
For example, in the Million Women UK cohort study there was evidence of
qualitative interaction (effect reversal) between two primary factors: use of
cyclic-combined hormone-replacement therapy (HRT) and body mass index,
with respect to the risk of developing endometrial cancer [Beral et al. (2005)].
Women who were of normal body weight (body mass index < 25 kg/m²)
had a significantly increased risk of endometrial cancer if they had ever used
this type of HRT, whereas women who were obese (body mass index of 30+
kg/m²) had a significantly reduced risk of endometrial cancer if they had
ever used this type of HRT compared to never users. A formal test should 
usually be performed to assess whether the qualitative interaction could be 
due to chance variation; see, for example, Azzalini and Cox (1984). 

Note that the approach used to analyze and display the data will impact 
on the interpretation. The approach of a single baseline group (Table 1) 
allows for easy examination of the consistency with different models, such as 
the additive versus the multiplicative model, but does not reveal immediately 
whether there is qualitative interaction. The opposite is true for the approach 
of multiple contingency tables (Table 2). 
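
With stratum-specific estimates laid out as in Table 2, an informal screen for
effect reversal is to check whether the confidence intervals lie wholly on
opposite sides of the null value. The sketch below is only such a screen, with
hypothetical function and variable names; it is not the formal test for
qualitative interaction referred to above.

```python
def direction(ci_low, ci_high, null=1.0):
    """Classify a confidence interval for a ratio measure relative to the null."""
    if ci_low > null:
        return "increased"
    if ci_high < null:
        return "reduced"
    return "inconclusive"


# Relative risk and 95% CI by body mass index, from Table 2.
table2 = {
    "<25": (1.54, 1.20, 1.99),
    "25-29": (1.07, 0.82, 1.40),
    "30+": (0.67, 0.49, 0.91),
}

directions = {bmi: direction(low, high) for bmi, (_rr, low, high) in table2.items()}
effect_reversal = (
    "increased" in directions.values() and "reduced" in directions.values()
)
print(directions, effect_reversal)
```

The same check applied to Table 1 would not be informative, because all
relative risks there are expressed against a single baseline group.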

5.1.3. Continuous scale interactions. Some special considerations apply
in considering interaction between two primary factors both with levels
specified quantitatively. An example would concern the levels of two different
atmospheric pollutants, the outcome being some measure of disease incidence.
For given levels of other explanatory variables, interaction between the two
quantitative factors with levels $x_1$ and $x_2$ amounts to departure from the
so-called generalized additive model [Hastie and Tibshirani (1990)]

$$E\{Y(x_1, x_2)\} = \alpha_1(x_1) + \alpha_2(x_2),$$

where $Y(x_1, x_2)$ is the outcome for an individual with the specified levels of
the explanatory variables.

There are two broad situations. In one $x_1$ and $x_2$ are very different kinds
of factors which may individually have effects on response that are quite
complicated, but which may act virtually independently inducing additivity.
In such a situation interaction would be tested formally by introducing a
term, or possibly a small number of additional terms, into the model. These
might, for example, be a simple product such as $x_1 x_2$ or, possibly better,
$\hat{\alpha}_1(x_1)\hat{\alpha}_2(x_2)$, where $\hat{\alpha}_j(x_j)$ is a
preliminary estimate of $\alpha_j(x_j)$.

In a contrasting situation $(x_1, x_2)$ are coordinates specifying points in
a factor space and other coordinate systems may possibly be more
interpretable. A notion stemming from the industrial response surface literature
is that in the absence of quantitative background knowledge it may be best
to think of the expected response as a function of $(x_1, x_2)$ that within a
restricted region can be expanded in a Taylor series around some central
reference level. From this perspective, if a model linear in the explanatory
variables is inadequate, it will be sensible to add terms in
$(x_1^2, x_1 x_2, x_2^2)$, of which the middle one represents interaction. In
this context the generalized additive model may not be reasonable; generality
of the functions $\alpha_j(x_j)$ combined with exclusion of product (interaction)
terms would probably be justified only as a device for transforming the
individual $x_j$ to some relatively simple form for which interpretation via a
first-order model is available. For studies of behavior near a local stationary
value, use of at least second-order terms is needed; absence of interaction
would mean that the local quadratic approximation had principal axes along the
coordinate axes and, in general, there seems to be no reason to expect this. In
such situations it may be best to abandon the main effect-interaction framework
as a basis for interpretation and to concentrate on the expected response as a
function to be estimated in some hopefully enlightening form [Box and Draper (2007)].
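
The link between the product term and the orientation of the principal axes
can be made concrete. The sketch below assumes a fitted second-order part of
the form b11*x1**2 + b12*x1*x2 + b22*x2**2; the coefficient names, the function
names and the numerical values are hypothetical. The principal axes are
rotated from the coordinate axes by an angle theta satisfying
tan(2*theta) = b12 / (b11 - b22), so they coincide with the coordinate axes
when the product coefficient b12 is zero.

```python
import math


def principal_axis_angle(b11, b12, b22):
    """Angle in radians between the x1-axis and a principal axis of
    b11*x1**2 + b12*x1*x2 + b22*x2**2."""
    return 0.5 * math.atan2(b12, b11 - b22)


def axes_aligned_with_coordinates(b11, b12, b22, tol=1e-9):
    """True if the principal axes lie along the coordinate axes."""
    return abs(math.sin(2.0 * principal_axis_angle(b11, b12, b22))) < tol


# Hypothetical coefficients near a local maximum.
no_interaction = (-2.0, 0.0, -1.0)    # product term absent
with_interaction = (-1.0, 0.8, -1.0)  # product term present

print(axes_aligned_with_coordinates(*no_interaction))
print(axes_aligned_with_coordinates(*with_interaction))
print(math.degrees(principal_axis_angle(*with_interaction)))
```

A rotation of the coordinate system would remove the product term in the
second case, which is one reason the main effect-interaction decomposition is
not a natural basis for interpretation here.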

5.2. Interaction between a primary and intrinsic factor. The interpreta- 
tion of interaction between a primary and an intrinsic factor may be quite 
straightforward. A pattern of effects has to be studied to some extent sep- 
arately at the different levels of the intrinsic factor; this is sometimes also 
referred to as examination of effect-modification. Typically, if interaction
is present, the main effect of the primary factor, while it may sometimes
provide a useful qualitative synthesis, is not relevant for detailed
interpretation. It involves an averaging over levels of the intrinsic factor which may
be essentially meaningless. However, if it is found that the main effect of the
primary factor is stable across the levels of the intrinsic factor, this implies
that the findings are more generalizable.

Table 3
Estimated relative risk of Parkinson's disease in relation to coffee consumption,
according to sex [Ascherio et al. (2004)]

Coffee consumption   Sex       Relative risk   (95% CI)
6+ vs cups/week      males     0.34            (0.16, 0.75)
6+ vs cups/week      females   1.09            (0.61, 1.93)

Although the statistical methods for evaluating interaction between a pri- 
mary and intrinsic factor are essentially the same as those for the evaluation 
of interaction between two primary factors, the route to interpretation is 
different, because the roles of the primary and the intrinsic factor are asym- 
metrical. 

If the intrinsic factor has quantitative levels, more elaborate models may 
aid interpretation. In these the nature of an interaction may change smoothly, 
or indeed linearly, with the level of the intrinsic variable. 

For example, Ascherio et al. (2004) found evidence that high coffee con- 
sumption was associated with a significantly reduced risk of Parkinson's 
disease for men, but there was no evidence of such an effect for women 
(Table 3). Hence, it is not appropriate to summarize these results without 
reference to sex. The average risk of Parkinson's disease from high level cof- 
fee consumption for men and women combined would be meaningless. The 
asymmetry between the primary and the intrinsic factor can be understood 
here by considering what the interpretation would be if they had presented 
the relative risk of Parkinson's disease associated with sex according to level 
of coffee consumption. This is clearly not a sensible biological viewpoint. 
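
One rough way to quantify the difference between the two sex-specific
estimates in Table 3 is to compare them on the log scale, recovering standard
errors from the reported confidence limits. This is a sketch under stated
assumptions, not the analysis used in the cited study: it assumes the two
estimates are independent, that the log relative risks are approximately
normal, and that the intervals are symmetric on the log scale with multiplier
1.96. All function names are hypothetical.

```python
import math


def log_rr_and_se(rr, ci_low, ci_high, z=1.96):
    """Log relative risk and its standard error recovered from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return math.log(rr), se


def heterogeneity_z(est_a, est_b):
    """z statistic for the difference between two independent log estimates."""
    (log_a, se_a), (log_b, se_b) = est_a, est_b
    return (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(z):
    """Two-sided normal tail probability for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Relative risks and 95% confidence limits from Table 3.
males = log_rr_and_se(0.34, 0.16, 0.75)
females = log_rr_and_se(1.09, 0.61, 1.93)

z_stat = heterogeneity_z(males, females)
p_value = two_sided_p(z_stat)
print(round(z_stat, 2), round(p_value, 3))
```

The comparison is deliberately asymmetric in interpretation: it asks whether
the effect of the primary factor differs between the levels of the intrinsic
factor, not the reverse.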

5.3. Interaction between a primary and a nonspecific factor. Suppose for 
simplicity of discussion that there are two alternative treatments T and C 
and that an estimate of the treatment contrast can be found separately at 
a number of centers, these being regarded as defining nonspecific factors in 
the sense explained above. 

Two rather different situations need consideration. In one an internal es- 
timate of the precision of these individual contrasts is available, either from 
explicit replication within centers or from implicit replication, for instance, 
a reasoned assumption of binomial or Poisson variability. If the treatment 
by center interaction is appreciable and clearly statistically significant, there
is unexplained additional variation present affecting the primary treatment
contrast. This should be explained if at all possible, for example, by regres- 
sion on whole-center features. 

If that is not possible, it may be unavoidable to treat the additional 
variation as random and to introduce an additional component of variance. 
The presence of this component will inflate the standard error of the pri- 
mary treatment contrast, and, unless the centers contribute essentially equal 
amounts of information, will move the weighting to be attached to the dif- 
ferent centers in the direction of equal weighting. The implicit treatment 
contrast of concern is now an average over an ensemble of repetitions. Note 
that if the degrees of freedom available to estimate this additional com- 
ponent of variance are small, estimation of it, while formally possible, is 
extremely fragile and it is likely to be wiser either simply to list estimates 
center by center or to use a sensitivity analysis of dependence on the poorly 
estimated component. 

The inclusion of an additional component of variance will typically inflate, 
possibly appreciably, the estimated standard error of the overall effect. Such 
an analysis is often described as treating centers as a random effect. This is a 
little misleading, however, in that centers are unlikely to be a random sample 
from a meaningful population. Rather, it is the unexplained interaction that 
is being modeled as generated stochastically. 
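
The two consequences described above, an inflated standard error and weights
moving toward equality, can be seen in a small calculation. The sketch below
uses the DerSimonian-Laird moment estimator of the additional variance
component; that choice is an assumption made here for illustration, since the
text does not name an estimator, and the center-level estimates and variances
are hypothetical.

```python
import math


def fixed_and_random_effects(estimates, variances):
    """Inverse-variance pooling with and without an extra variance component.

    The between-center component tau2 is estimated by the DerSimonian-Laird
    method of moments and truncated at zero.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Heterogeneity statistic and moment estimate of the variance component.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Weights after adding tau2 to every within-center variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    pooled_re = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sw_re

    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sw),
        "random": pooled_re,
        "se_random": math.sqrt(1.0 / sw_re),
        "tau2": tau2,
        "weights_fixed": [wi / sw for wi in w],
        "weights_random": [wi / sw_re for wi in w_re],
    }


# Hypothetical treatment contrasts (e.g., log odds ratios) from four centers.
result = fixed_and_random_effects(
    estimates=[0.1, 0.9, 0.4, 1.2],
    variances=[0.01, 0.04, 0.09, 0.25],
)
spread_fixed = max(result["weights_fixed"]) - min(result["weights_fixed"])
spread_random = max(result["weights_random"]) - min(result["weights_random"])
print(result["se_fixed"], result["se_random"], spread_fixed, spread_random)
```

With only four centers the estimate of the variance component rests on three
degrees of freedom, which is the fragile situation the text warns about.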

If, however, there is no effective replication within centers then the treat- 
ment by center interaction provides a base for error estimation; the simplest 
special case is the standard analysis of a randomized block design. 

Duijts et al. (2003), for example, conducted a meta-analysis of epidemio- 
logical studies of stressful life events and the risk of breast cancer. When the 
results from all eleven published epidemiological studies were combined the 
summary odds ratio for ever versus never having had a stressful life event 
was 1.77 (95% CI: 1.31 to 2.40). However, there was evidence of significant 
heterogeneity between the results from the eleven studies (i.e., interaction 
with the nonspecific factor 'study'). The authors investigated whether sev- 
eral study level primary and intrinsic factors might explain this between 
study heterogeneity. The results in Table 4 show that the summary odds 
ratios were found to vary significantly according to whether there had been 
adjustment for the key confounding factors (p < 0.001). Between the studies 
that had adjusted for the key confounding factors there was still, however, 
significant heterogeneity that could not be explained by other study level 
factors. This additional heterogeneity, having no known deterministic ex- 
planation, was then treated as random and incorporated as an additional 
component of variance using a random effects model. 

Table 4
Estimated summary odds ratios for breast cancer and stressful life events, according to
confounding adjustment [Duijts et al. (2003)]

Stressful life events   Adjusted for key confounders?   Odds ratio   (95% CI)
yes vs no               no                              1.04         (0.90, 1.20)
yes vs no               yes                             2.22         (1.39, 3.56)

The use of the random effects model implicitly allows for the possibility
of qualitative interactions between the primary and nonspecific factor. Some
have argued against the use of this approach, because, as noted earlier,
qualitative interactions should be relatively uncommon [Peto (1982)]. There are
in any case substantial difficulties in combining studies where the
supplementary variables used to adjust, say, the odds ratio, are very different for
the distinct studies. More generally, the conceptual difficulties in treating
replication in space or time as random were clearly set out in one of the
earliest treatments of the summarization of evidence from repeated studies
[Yates and Cochran (1938)].

5.4. Higher-order interaction. The difficulty of interpreting interactions 
increases rapidly with the number of factors involved, even if, in principle, 
the points made in connection with two-factor interactions cover many of 
the ideas needed. For example, Znaor et al. (2003) conducted a study of 
risk factors for oral cancer in Indian men. There was evidence that the joint 
effect of the three primary risk factors of interest (tobacco smoking, tobacco 
chewing and alcohol drinking) was approximately additive, but was
significantly less than multiplicative (additive on the log-scale). Interpretation of
the sub-multiplicative three-way interaction can be aided by
investigation of its source. Table 5 shows the odds ratios for each
combination of the three risk factors compared to those that were not exposed
to any of the three factors. The observed odds ratio for the joint effect of 
all three risk factors (16.34) was significantly less than would have been ex- 
pected under the multiplicative model (58.14). Examination of each of the 
two-factor interactions shows that the joint effect of smoking and chewing 
tobacco was much lower than would have been expected under the multi- 
plicative model (8.53 compared to 22.71). The joint effect of smoking and 
alcohol was also slightly lower than expected (4.81 compared to 6.27), but 
the observed joint effect of chewing tobacco and alcohol was consistent with 
the expected multiplicative joint effect (24.28 compared to 23.73). Hence, 
the main source of the sub-multiplicative three-way interaction appears to 
be the sub-multiplicative two-way interaction between smoking and chewing 
tobacco, but the sub-multiplicative interaction between smoking and
alcohol may have contributed also. For binary data a formal test of three-factor
interactions in a 2 x 2 x 2 table was given by Bartlett (1935). 
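
The expected values quoted in this comparison follow from multiplying the
single-factor odds ratios in Table 5. The following is a minimal sketch that
recomputes them; the dictionary and function names are hypothetical, and no
significance testing is attempted.

```python
# Odds ratios for exposure to a single factor, from Table 5.
single = {"smoking": 2.45, "chewing": 9.27, "alcohol": 2.56}

# Observed odds ratios for joint exposures, from Table 5.
observed = {
    ("smoking", "chewing"): 8.53,
    ("smoking", "alcohol"): 4.81,
    ("chewing", "alcohol"): 24.28,
    ("smoking", "chewing", "alcohol"): 16.34,
}


def expected_multiplicative(factors):
    """Expected joint odds ratio if the single-factor odds ratios multiply."""
    product = 1.0
    for factor in factors:
        product *= single[factor]
    return product


expected = {
    combo: round(expected_multiplicative(combo), 2) for combo in observed
}
for combo, obs in observed.items():
    print(" + ".join(combo), obs, expected[combo])
```

Setting the observed and expected values side by side in this way shows which
pairs of factors account for most of the shortfall from the multiplicative
model.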

Table 5
Odds ratio (OR) for interaction for combinations of smoking, chewing tobacco and
alcohol for the risk of oral cancer [Znaor et al. (2003)]

Smoking   Chewing tobacco   Alcohol   Odds ratio (and 95% CI)
No        No                No        1.00 (-)
No        Yes               No        9.27 (6.79-12.66)
Yes       No                No        2.45 (1.94-3.10)
No        No                Yes       2.56 (1.42-4.64)
Yes       Yes               No        8.53 (6.13-11.89)
No        Yes               Yes       24.28 (14.87-39.65)
Yes       No                Yes       4.81 (3.74-6.19)
Yes       Yes               Yes       16.34 (12.13-22.00)

In the previous discussion we have not suggested interpretations directly 
based on the formal parameters used in representing interactions in a model, 
regarding such models as more useful for testing for interaction than for its 
interpretation. In some applications, however, the pattern of, say, two-factor 
interactions, may be of prime concern. The stability of that pattern, for 
example, over replication of a nonspecific factor is then of interest. 

An example is the study of social mobility where the primary data are 
essentially square contingency tables with the rows labeled by class of origin 
and the columns by class of destination. Interest may lie not in the changes 
in the marginal distribution between origin and destination, but rather in 
the pattern of interactions and in the stability of that pattern across time 
or countries. 

This can be represented as follows. In one study let $\pi_{ij}$ be the probability
that an individual is in origin class $i$ and destination class $j$. Write

$$\pi_{ij} = \pi_{i.}\,\pi_{.j}\,\psi_{ij},$$

where $\pi_{i.}, \pi_{.j}$ are marginal probabilities and the $\psi_{ij}$ satisfy the
appropriate constraints. Now suppose that there is a third factor, say, a
nonspecific factor. When this takes level $k$, we write the corresponding
probability $\pi_{ij;k}$; that is, for each fixed $k$, this defines a probability
distribution over the corresponding square table. Then a model in which the
pattern of interaction is essentially the same for each level of $k$ but the
magnitude of the interaction effect varies is represented in the form

$$\pi_{ij;k} = \pi_{i.;k}\,\pi_{.j;k}\,\rho_k\,\psi_{ij}.$$

This is one of a quite wide range of special models that can be considered
for multiple contingency tables [Agresti (1990) and Goodman (1985)]. We
do not discuss here the directly related, although conceptually different,
literature of interaction in multiple contingency tables in which the different
dimensions of the table are treated on an equal footing. The connection
between log linear models and additive models [Lancaster (1969) and Darroch
(1974)] parallels the present discussion.

A rather different aspect of higher-order interaction for binary observa- 
tions concerns the possible reversal of association as between marginal and 
conditional association, the Yule-Simpson effect [Yule (1903)]. A related 
issue is the possibility of spurious allelic association [Cardon and Palmer 
(2003)] where an observed dependence arises from mixing individuals, say, 
from different ethnic groups within each of which independence holds. This 
in turn is related to latent class analysis [Lazarsfeld and Henry (1968)] in 
which the aim is to represent observed multivariate dependencies by a small 
set of latent classes within each of which independence holds. A quantitative 
discussion of the modifying effect of marginalizing in this context is given 
by Cox (2003). 
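
The reversal between conditional and marginal association is easy to
reproduce with a small table. The counts below are hypothetical and were
chosen only so that the treated group does better within each stratum and
worse after pooling; the names used are assumptions for this sketch.

```python
# Hypothetical (successes, total) counts by stratum and arm.
strata = {
    "stratum 1": {"treated": (8, 10), "control": (70, 100)},
    "stratum 2": {"treated": (20, 100), "control": (1, 10)},
}


def rate(successes, total):
    """Proportion of successes."""
    return successes / total


def pooled_rate(arm):
    """Success proportion for one arm after ignoring the stratum."""
    successes = sum(strata[name][arm][0] for name in strata)
    total = sum(strata[name][arm][1] for name in strata)
    return rate(successes, total)


conditional = {
    name: (rate(*arms["treated"]), rate(*arms["control"]))
    for name, arms in strata.items()
}
marginal = (pooled_rate("treated"), pooled_rate("control"))
print(conditional, marginal)
```

The reversal arises because the strata differ both in their success rates and
in how individuals are divided between the two arms.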

6. Epistasis. We return to the relatively simple situation in which we 
concentrate on a two-way table showing the mean response at various levels 
of two factors, at least one a primary factor. Our primary route to inter- 
pretation is via the notion of the no-interaction model as a reference model 
with departures from it, if they are present, described essentially verbally. 
There are, however, other possible representations that are in a sense just 
as simple as the no-interaction model. In genetics these are described as 
epistasis; different authors use the term somewhat differently. 

Suppose, for simplicity, that there are two two-level factors specified by 
$i = -1, 1$ and $j = -1, 1$ and that the mean response at level $(i, j)$ is

$\mu_{ij} = \mu + \alpha i + \beta j + \gamma ij.$

Then the no-interaction model has $\gamma = 0$.
One simple epistatic model has

$\mu_{11} = \nu + \Delta, \quad \mu_{ij} = \nu \quad \text{(otherwise)}.$

This is a two-parameter model, as contrasted with the three-parameter
no-interaction model. Yet the epistatic model is not a special case of the
no-interaction model. The totally null case $\Delta = 0$ is typically of no
interest in this context and we assume that the data strongly exclude this.
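
To see why neither model contains the other, it may help to express the epistatic model in the parameters of the full model. The following is a sketch under our reading of the notation: the full model is $\mu_{ij} = \mu + \alpha i + \beta j + \gamma ij$ with $i, j = \pm 1$, the epistatic model is $\mu_{11} = \nu + \Delta$ with $\mu_{ij} = \nu$ otherwise, and the subscript $11$ is taken to mean $i = 1$, $j = 1$.

```latex
\begin{align*}
\mu    &= \tfrac14\,(\mu_{11} + \mu_{1,-1} + \mu_{-1,1} + \mu_{-1,-1}) = \nu + \Delta/4,\\
\alpha &= \tfrac14\,(\mu_{11} + \mu_{1,-1} - \mu_{-1,1} - \mu_{-1,-1}) = \Delta/4,\\
\beta  &= \tfrac14\,(\mu_{11} - \mu_{1,-1} + \mu_{-1,1} - \mu_{-1,-1}) = \Delta/4,\\
\gamma &= \tfrac14\,(\mu_{11} - \mu_{1,-1} - \mu_{-1,1} + \mu_{-1,-1}) = \Delta/4.
\end{align*}
```

On this reading the epistatic model corresponds to $\alpha = \beta = \gamma$, so that it satisfies the no-interaction condition $\gamma = 0$ only in the totally null case $\Delta = 0$.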

Comparison of the models is most fruitfully achieved by testing separately 
consistency with the two models, leading to the conclusion, assessed by p- 
values, that the data are consistent with one, both or neither model. 

We deal in outline with the simplest case of normally distributed data with 
equal sample sizes and constant variance but the details are not essentially 
different if, for instance, the data are represented by logistic models for 
probabilities. 
Consistency with the no-interaction model can be tested only in effect
by the least-squares estimate of $\gamma$ in the full model. Consistency with
the epistatic model is tested by the mutual consistency of the three means
excluding $\mu_{11}$, leading to a variance-ratio test with upper degrees of
freedom equal to two. Unless there is further information, such as that the two
factors are expected to have approximately equal effects of the same sign,
$\alpha = \beta$, there is no basis for extracting a single degree of freedom.
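
The two consistency tests can be sketched in a few lines for the simplest case of equal replication and constant variance. This is an illustrative sketch, not a prescribed implementation: the function name `consistency_tests`, the data layout and the returned keys are our own, and the p-values would be obtained by referring the statistics to the t and F distributions with the stated degrees of freedom.

```python
from statistics import mean


def consistency_tests(cells):
    """Test statistics for the no-interaction and the epistatic model.

    `cells` maps (i, j), with i and j in {-1, 1}, to a list of n observations;
    equal n in every cell and constant variance are assumed.
    """
    keys = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    n = len(cells[keys[0]])
    if n < 2 or any(len(cells[k]) != n for k in keys):
        raise ValueError("need the same number (at least 2) of observations per cell")

    means = {k: mean(cells[k]) for k in keys}

    # Pooled within-cell variance on 4(n - 1) degrees of freedom.
    df_error = 4 * (n - 1)
    s2 = sum((y - means[k]) ** 2 for k in keys for y in cells[k]) / df_error

    # No-interaction model: least-squares estimate of gamma in the full model,
    # with variance sigma^2 / (4n); refer t to the t distribution on df_error.
    gamma_hat = sum(i * j * means[(i, j)] for i, j in keys) / 4
    t_no_interaction = gamma_hat / (s2 / (4 * n)) ** 0.5

    # Epistatic model: mutual consistency of the three means excluding (1, 1);
    # refer F to the F distribution on (2, df_error) degrees of freedom.
    others = [means[k] for k in keys if k != (1, 1)]
    centre = mean(others)
    ss_between = n * sum((m - centre) ** 2 for m in others)
    f_epistatic = (ss_between / 2) / s2

    return {
        "gamma_hat": gamma_hat,
        "t_no_interaction": t_no_interaction,
        "f_epistatic": f_epistatic,
        "df_error": df_error,
    }
```

A large value of `t_no_interaction` in absolute terms together with a small `f_epistatic` would point to consistency with the epistatic model only; the reverse pattern would point to the no-interaction model only.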

Parallel tests based on the relevant log likelihood functions are available 
more generally. 

7. Interaction in balanced factorial designs. Historically many of the 
ideas about interaction were first formulated in detail in connection with 
randomized factorial experiments, including those of quite complicated form. 
For such factorial experiments, at least those with a continuous and approx- 
imately normally distributed outcome, the powerful technique of analysis 
of variance allows the simultaneous inspection of interactions of all orders. 
Moreover, the distinction between factors describing the structure of the 
experimental units, block factors, and those determining the randomized 
treatments corresponds to the distinction between intrinsic and nonspecific 
factors contrasted with primary factors. 

The role of analysis of variance in such contexts is partly in establishing 
via the table of degrees of freedom the logical structure of the data, and 
partly in indicating how the error to be attached to any type of contrast 
is to be estimated. This last is particularly important when treatments and 
experimental units have relatively complicated structure and lead to differ- 
ent sources of error, all based in effect on interactions between treatment 
and components of nonspecific variation. 

In the absence of special reasons to the contrary, it will be sensible to start 
the formal analysis of such data by finding the full analysis of variance table 
together with all two- and some three-way tables of means and associated 
standard errors. This involves typically calculation of interactions of many 
different orders. Significance of many interactions involving, in particular, 
an intrinsic factor often suggests splitting the data into separate sections 
on the basis of that factor, for example, analyzing male and female sections 
separately. Use of other than the full analysis of variance table, or in other 
words, pooling of terms, may be needed to enhance error estimation, but 
this is to be regarded as a second-order effect. 

The special feature of analysis by the standard normal-theory linear model 
is that the decomposition of the observational vector into orthogonal com- 
ponents, and therefore the additivity of sums of squares, typically allows 
assessment of effects of all orders virtually simultaneously. Analogous pro- 
cedures, for example, log likelihood decompositions, are available for more 
general models and unbalanced data, but are typically contingent on a full 
model specification. That is, omission of certain terms from a model changes
estimates of the other parameters. This tends to make an approach starting 
from a very general model with many interaction terms impracticable in 
such situations. It is the analysis strategy for detecting interactions that is 
changed rather than any issue of interpretation. 
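
The dependence of estimates on the fitted terms can be seen in a small sketch. It uses invented, noise-free cell means for a 2 x 2 layout with factor levels coded -1 and 1, and a hand-written least-squares routine; all names (`solve`, `least_squares`, `cell_mean`, `make_obs`) and numerical values are our own assumptions for illustration. With equal replication the estimated main effect is the same whether or not the interaction term is fitted; with unequal replication it is not.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def least_squares(obs, with_interaction):
    """Fit mean = mu + alpha*i + beta*j (+ gamma*i*j) to triples (i, j, y)."""
    rows = [[1.0, i, j] + ([i * j] if with_interaction else []) for i, j, _ in obs]
    ys = [y for _, _, y in obs]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p)]
    return solve(xtx, xty)


def cell_mean(i, j):
    # Invented values: mu = 10, alpha = 2, beta = 1, gamma = 1.5.
    return 10 + 2 * i + 1 * j + 1.5 * i * j


def make_obs(counts):
    """Repeat the noise-free cell mean `counts[(i, j)]` times in each cell."""
    return [(i, j, cell_mean(i, j)) for (i, j), n in counts.items() for _ in range(n)]


balanced = make_obs({(1, 1): 2, (1, -1): 2, (-1, 1): 2, (-1, -1): 2})
unbalanced = make_obs({(1, 1): 4, (1, -1): 1, (-1, 1): 1, (-1, -1): 1})

balanced_full = least_squares(balanced, with_interaction=True)
balanced_main = least_squares(balanced, with_interaction=False)
unbalanced_full = least_squares(unbalanced, with_interaction=True)
unbalanced_main = least_squares(unbalanced, with_interaction=False)

print("alpha, balanced:  ", balanced_full[1], balanced_main[1])
print("alpha, unbalanced:", unbalanced_full[1], unbalanced_main[1])
```

In the unbalanced case the interaction column is no longer orthogonal to the main-effect columns, so dropping it shifts the estimate of alpha.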

8. Interaction detection in relatively large systems. The emphasis in this 
paper is on the interpretation of interactions rather than on their detection, 
but we now comment briefly on interaction-detection in analyses in which 
the primary emphasis is on the representation of dependency of outcome on 
a fairly large number of explanatory variables. This is often in the first place 
specified by some form of linear regression representing main effects of the 
explanatory variables, in particular, identifying those with major effects on 
the outcome. It will be essential in interpreting such relations to distinguish 
between the various kinds of explanatory factors and to ensure that the 
relation fitted is consistent with any internal structure among the primary 
explanatory variables [Cox and Wermuth (1996)]. 

Subject to that, a search for interactions among the explanatory variables
will often be confined to interaction involving at least one primary factor. In
some cases it may be feasible to fit all such interactions simultaneously, as, 
for example, in the previous section. More commonly, in large observational 
studies it is likely to be preferable to fit relevant interactions as single degrees 
of freedom at a time and to make a normal probability plot from the resulting 
t statistics [Cox and Wermuth (1994)]. 
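
A minimal sketch of this one-at-a-time procedure follows. It is not the authors' implementation: the function names are ours, every pair of explanatory variables is examined (a restriction to pairs involving a primary factor would simply shorten the loop), and the plotting positions (k + 0.5)/m are one common convention among several. Each product term is added singly to the main-effects regression, its t statistic is recorded, and the ordered t statistics are paired with expected normal quantiles ready for plotting.

```python
from itertools import combinations
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def t_for_last_column(rows, ys):
    """Least-squares t statistic for the coefficient of the last design column."""
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p)]
    coef = solve(xtx, xty)
    resid_ss = sum(
        (y - sum(c * x for c, x in zip(coef, r))) ** 2 for r, y in zip(rows, ys)
    )
    s2 = resid_ss / (len(ys) - p)
    # Last diagonal element of the inverse of X'X, found by solving X'X v = e_p.
    unit = [0.0] * (p - 1) + [1.0]
    var_last = s2 * solve(xtx, unit)[-1]
    return coef[-1] / var_last ** 0.5


def interaction_t_statistics(xs, ys):
    """Add each product x_a * x_b singly to the main-effects model; return its t."""
    q = len(xs[0])
    stats = {}
    for a, b in combinations(range(q), 2):
        rows = [[1.0] + list(x) + [x[a] * x[b]] for x in xs]
        stats[(a, b)] = t_for_last_column(rows, ys)
    return stats


def normal_plot_points(t_stats):
    """Pair the ordered t statistics with expected normal quantiles."""
    ordered = sorted(t_stats.items(), key=lambda item: item[1])
    m = len(ordered)
    normal = NormalDist()
    return [
        (normal.inv_cdf((k + 0.5) / m), t, pair)
        for k, (pair, t) in enumerate(ordered)
    ]
```

Points lying close to a line through the origin suggest no more than chance variation; pairs whose t statistics fall well off the line at either end are candidates for closer study.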

9. Ill-specified interactions. It has been implicit in the previous discus- 
sion that each interaction of potential interest can be encapsulated if not 
in a single parameter at least in a very small number. This is desirable 
for, among other reasons, incisive interpretation. This fails if, for example, 
the data are essentially, after adjustment for other effects, in the form of 
an $r \times c$ table suggesting an interaction test having $(r - 1)(c - 1)$ degrees
of freedom. If one or both r and c are not small, the resulting procedure 
has some sensitivity against a general class of departures from additivity, 
but poor properties for specific kinds of departure which may have special 
plausibility. One route is to take an interaction defined by the product of 
scores attached separately to the rows and columns. In the absence of scores 
derived, for example, from the ordinal character of the levels, products of 
estimated main row and column effects may be used [Tukey (1949)]. See also 
Yates (1948). 
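
The single degree of freedom based on products of estimated row and column effects can be sketched as follows for an r x c table with one observation per cell. The function name `tukey_one_df` and the returned keys are our own; the F statistic would be referred to the F distribution on 1 and (r - 1)(c - 1) - 1 degrees of freedom.

```python
def tukey_one_df(table):
    """One-degree-of-freedom test for nonadditivity in an r x c table.

    `table` is a list of r rows, each a list of c values (one per cell).
    """
    r, c = len(table), len(table[0])
    df_remainder = (r - 1) * (c - 1) - 1
    if df_remainder < 1:
        raise ValueError("table too small to leave a remainder degree of freedom")

    grand = sum(sum(row) for row in table) / (r * c)
    row_eff = [sum(row) / c - grand for row in table]
    col_eff = [sum(table[i][j] for i in range(r)) / r - grand for j in range(c)]

    cells = [(i, j) for i in range(r) for j in range(c)]

    # Usual interaction (residual) sum of squares after fitting rows and columns.
    ss_interaction = sum(
        (table[i][j] - grand - row_eff[i] - col_eff[j]) ** 2 for i, j in cells
    )

    # Component along the product of estimated row and column effects.
    denom = sum(a * a for a in row_eff) * sum(b * b for b in col_eff)
    if denom == 0:
        raise ValueError("row or column effects are all zero")
    cross = sum(row_eff[i] * col_eff[j] * table[i][j] for i, j in cells)
    ss_nonadd = cross ** 2 / denom

    ss_remainder = ss_interaction - ss_nonadd
    if ss_remainder > 0:
        f_stat = ss_nonadd / (ss_remainder / df_remainder)
    else:
        f_stat = float("inf")

    return {
        "ss_nonadd": ss_nonadd,
        "ss_remainder": ss_remainder,
        "df_remainder": df_remainder,
        "f": f_stat,
    }
```

The same calculation with externally supplied row and column scores in place of `row_eff` and `col_eff` gives the score-based version of the test.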

It is a matter of context whether importance lies primarily in establishing 
and interpreting interaction or in showing its effective absence. Absence 
of interpretable interaction of an important primary factor with intrinsic 
and nonspecific factors is a partial base for hope that any conclusion is 
generalizable to new situations and applicable to specific individuals. One
of the broad themes of the paper is that the importance of the notion of 
interaction is in no way confined to relatively complicated issues connected 
with multiple contingency tables and complex factorial experiments. 